
176 research outputs found

A 20 MHz CMOS reorder buffer for a superscalar microprocessor

Author: Bagherzadeh Nader
Lenell John
Wallace Steve

Superscalar processors can achieve increased performance by issuing instructions out of order from the original sequential instruction stream. Implementing an out-of-order instruction issue policy requires a hardware mechanism to prevent incorrectly executed instructions from updating register values. A reorder buffer can be used to allow a superscalar processor to issue instructions out of order and maintain program correctness. This paper describes the design and implementation of a 20 MHz CMOS reorder buffer for superscalar processors. The reorder buffer is designed to accept and retire two instructions per cycle. A full-custom layout in 1.2-micron technology has been implemented, measuring 1.1058 mm by 1.3542 mm.
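The in-order retirement idea in this abstract can be illustrated with a small software model. This is only a sketch under assumptions: the class name `ReorderBuffer`, its methods, the capacity of 8 entries, and the register names are all hypothetical and are not taken from the paper, which describes a hardware design. The model keeps entries in program order, lets results arrive in any order, and commits at most two completed entries per call from the head of the buffer.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Toy software model of a reorder buffer (hypothetical, not the paper's circuit)."""

    def __init__(self, capacity=8, width=2):
        self.capacity = capacity      # assumed number of entries
        self.width = width            # max instructions retired per cycle
        self.entries = OrderedDict()  # tag -> [dest_reg, value, done], in program order
        self.registers = {}           # architectural register file

    def allocate(self, tag, dest_reg):
        """Reserve an entry at the tail, in program order. Returns False when full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries[tag] = [dest_reg, None, False]
        return True

    def complete(self, tag, value):
        """Record a result; execution may finish in any order."""
        entry = self.entries[tag]
        entry[1] = value
        entry[2] = True

    def retire(self):
        """Commit up to `width` completed entries from the head, strictly in order."""
        retired = []
        while self.entries and len(retired) < self.width:
            tag, (dest_reg, value, done) = next(iter(self.entries.items()))
            if not done:
                break  # the oldest instruction is unfinished, so nothing younger may commit
            self.registers[dest_reg] = value
            del self.entries[tag]
            retired.append(tag)
        return retired
```

In this model a younger instruction that finishes early stays in the buffer until every older instruction has committed, which is what keeps register updates consistent with the sequential program.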

NASA Technical Reports Server

Energy and performance-aware application mapping for inhomogeneous 3D networks-on-chip

Author: Ahmadinia Ali
Bagherzadeh Nader
Opoku Agyeman Michael
Publication venue: 'Elsevier BV'
Publication date: 07/08/2018

Three-dimensional Networks-on-Chip (3D NoCs) have evolved as an ideal solution to the communication demands and complexity of future high-density many-core architectures. However, the design practicality of 3D NoCs faces several challenges such as thermal issues, high power consumption and area overhead of 3D routers, as well as high complexity and cost of vertical link implementation. To mitigate the performance and manufacturing cost of 3D NoCs, inhomogeneous architectures have emerged that combine 2D and 3D routers in 3D NoCs, producing lower area and energy consumption while maintaining the performance of homogeneous 3D NoCs. Due to the limited number of vertical links, application mapping on inhomogeneous 3D NoCs can be complex. However, application mapping has a great impact on the performance and energy consumption of NoCs. This paper presents an energy- and performance-aware application mapping algorithm for inhomogeneous 3D NoCs. The algorithm has been evaluated with various realistic traffic patterns and compared with existing mapping algorithms. Experimental results show that NoCs mapped with the proposed algorithm have lower energy consumption and a significant reduction in packet delays compared to the existing algorithms, and comparable average packet latency with Branch-and-Bound.

University of Northampton's Research Explorer

NECTAR

Performance and Energy Aware Inhomogeneous 3D Networks-on-Chip Architecture Generation

Author: Ahmadinia Ali
Bagherzadeh Nader
Opoku Agyeman Michael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/07/2015

University of Northampton's Research Explorer

Hybrid U-Net: Semantic Segmentation of High-Resolution Satellite Images to Detect War Destruction

Author: Bagherzadeh Nader
Harding Matthew
Hersh Jonathan
Nabiee Shima
Publication venue: Chapman University Digital Commons
Publication date: 09/07/2022

Destruction caused by violent conflicts plays a big role in understanding the dynamics and consequences of conflicts, which is now the focus of a large body of ongoing literature in economics and political science. However, existing data on conflict largely come from news or eyewitness reports, which makes them incomplete, potentially unreliable, and biased for ongoing conflicts. Using satellite images and deep learning techniques, we can automatically extract objective information on violent events. To automate this process, we created a dataset of high-resolution satellite images of Syria and manually annotated the destroyed areas pixel-wise. Then, we used this dataset to train and test semantic segmentation networks to detect building damage of various sizes. We specifically utilized a U-Net model for this task due to its promising performance on small and imbalanced datasets. However, the raw U-Net architecture does not fully exploit multi-scale feature maps, which are among the important factors for generating fine-grained segmentation maps, especially for high-resolution images. To address this deficiency, we propose a multi-scale feature fusion approach and design a multi-scale skip-connected Hybrid U-Net for segmenting high-resolution satellite images. In our experiments, U-Net and its variants demonstrated promising segmentation results in detecting various war-related building destruction. In addition, Hybrid U-Net resulted in a significant improvement in segmentation performance compared to U-Net and other baselines. In particular, the mean intersection over union and mean dice score improved by 7.05% and 8.09%, respectively, compared to those in the raw U-Net.
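The two metrics reported in this abstract, mean intersection over union and mean Dice score, can be computed from pixel labels as in the sketch below. The function names, the flat-list input format, and the convention of scoring 1.0 when a class is absent from both prediction and ground truth are assumptions made for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def iou_and_dice(pred, target, cls):
    """Return (IoU, Dice) for one class over flat sequences of pixel labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    if tp + fp + fn == 0:
        return 1.0, 1.0  # assumed convention: class absent everywhere counts as perfect
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice


def mean_iou_and_dice(pred, target, classes):
    """Average the per-class scores over the given class labels."""
    scores = [iou_and_dice(pred, target, c) for c in classes]
    mean_iou = sum(s[0] for s in scores) / len(scores)
    mean_dice = sum(s[1] for s in scores) / len(scores)
    return mean_iou, mean_dice


# Six pixels, 1 = destroyed, 0 = intact (made-up labels for illustration).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(iou_and_dice(pred, target, cls=1))
print(mean_iou_and_dice(pred, target, classes=(0, 1)))
```

For a single class the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.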

Chapman University Digital Commons

The Effects of Approximate Multiplication on Convolutional Neural Networks

Author: Bagherzadeh Nader
Del Barrio Alberto A.
Kim HyunJin
Kim Min Soo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/01/2021

This paper analyzes the effects of approximate multiplication when performing inferences on deep convolutional neural networks (CNNs). The approximate multiplication can reduce the cost of the underlying circuits so that CNN inferences can be performed more efficiently in hardware accelerators. The study identifies the critical factors in the convolution, fully-connected, and batch normalization layers that allow more accurate CNN predictions despite the errors from approximate multiplication. The same factors also provide an arithmetic explanation of why bfloat16 multiplication performs well on CNNs. The experiments are performed with recognized network architectures to show that the approximate multipliers can produce predictions that are nearly as accurate as the FP32 references, without additional training. For example, the ResNet and Inception-v4 models with Mitch-w6 multiplication produce Top-5 errors that are within 0.2% compared to the FP32 references. A brief cost comparison of Mitch-w6 against bfloat16 is presented, where a MAC operation saves up to 80% of energy compared to the bfloat16 arithmetic. The most far-reaching contribution of this paper is the analytical justification that multiplications can be approximated while additions need to be exact in CNN MAC operations. Comment: 12 pages, 11 figures, 4 tables, accepted for publication in the IEEE Transactions on Emerging Topics in Computing.
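To make the idea of approximate multiplication concrete, the sketch below implements Mitchell's logarithmic multiplication, a well-known approximate multiplier. Treating it as related to the "Mitch-w6" design named in the abstract is an assumption: the abstract does not define that name, and the sketch omits any bit-width truncation. The function name `mitchell_multiply` is hypothetical. The method writes each operand as 2^k · (1 + x) with 0 ≤ x < 1, approximates log2(1 + x) by x, adds the two approximate logarithms exactly, and converts back.

```python
def mitchell_multiply(a, b):
    """Approximate a * b for positive integers using Mitchell's log approximation."""
    if a <= 0 or b <= 0:
        raise ValueError("this sketch handles positive integers only")
    k1 = a.bit_length() - 1       # position of the leading one of a
    k2 = b.bit_length() - 1
    x1 = a / (1 << k1) - 1.0      # fractional part, 0 <= x1 < 1
    x2 = b / (1 << k2) - 1.0
    s = x1 + x2                   # exact addition of the approximate log mantissas
    if s < 1.0:
        return (1 << (k1 + k2)) * (1.0 + s)
    return (1 << (k1 + k2 + 1)) * s  # mantissa sum carried into the exponent


# The product is only approximated; the sum of the logarithms is exact.
for a, b in [(4, 8), (3, 3), (25, 37)]:
    print(a, b, a * b, mitchell_multiply(a, b))
```

The approximation never overshoots the exact product, and its relative error is bounded by 1/9 (about 11.1%), reached when both fractional parts equal 0.5, as with 3 × 3.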

arXiv.org e-Print Archive

eScholarship - University of California

Mapping and Scheduling in Heterogeneous NoC through Population-Based Incremental Learning

Author: Aedo Cobo José Edinson
Bagherzadeh Nader
Bolaños Martínez Freddy
Rivera Vélez Fredy Alexander
Publication venue: 'Austrian Veterinary Society'
Publication date: 01/01/2012

Network-on-Chip (NoC) is a growing and promising communication paradigm for Multiprocessor-System-on-Chip (MPSoC) design, because of its scalability and performance features. In designing such systems, mapping and scheduling are becoming critical stages, because of the increase in both network size and application complexity. Some reported solutions solve each issue independently. However, a joint approach to mapping and scheduling allows both computation and communication objectives to be taken into account simultaneously. This paper presents a mapping and scheduling solution based on a Population-Based Incremental Learning (PBIL) algorithm. The simulation results suggest that our PBIL approach is able to find optimal mapping and scheduling in a multi-objective fashion. A 2-D heterogeneous mesh was used as the target architecture for implementation, although the PBIL representation is suited to deal with more complex architectures, such as 3-D meshes.
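For readers unfamiliar with PBIL, the following is a minimal sketch of the basic binary form of the algorithm. It is not the paper's encoding: the function name `pbil`, the parameter defaults, and the toy objective (count of ones in the bit string) are assumptions chosen to keep the example short. A real mapping and scheduling problem would need a solution encoding and a multi-objective fitness, which the abstract does not specify. PBIL keeps a probability vector instead of an explicit population, samples candidates from it each generation, and shifts the vector toward the best sample.

```python
import random


def pbil(fitness, n_bits, pop_size=30, generations=100, learning_rate=0.1, seed=0):
    """Basic binary PBIL; returns (best_solution, best_fitness, probability_vector)."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits  # start with no preference for 0 or 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population from the current probability vector.
        population = [
            [1 if rng.random() < p else 0 for p in prob]
            for _ in range(pop_size)
        ]
        gen_best = max(population, key=fitness)
        gen_fit = fitness(gen_best)
        if gen_fit > best_fit:
            best, best_fit = gen_best, gen_fit
        # Move each probability toward the corresponding bit of the best sample.
        prob = [
            (1.0 - learning_rate) * p + learning_rate * bit
            for p, bit in zip(prob, gen_best)
        ]
    return best, best_fit, prob


# Toy objective (assumption): maximise the number of ones in a 16-bit string.
solution, score, probabilities = pbil(sum, n_bits=16)
print(solution, score)
```

The learning rate trades exploration for convergence speed: a larger value commits to the current best sample faster but makes it easier to settle on a poor solution.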

Biblioteca Digital del Sistema de Bibliotecas de la Universidad de Antioquia

Self-optimized Routing in a Network-on-a-Chip

Author: Jun Ho Bahn
Nader Bagherzadeh
Sebastian Schlingmann
Theo Ungerer
Wolfgang Trumler
Publication date: 01/01/2008

Many-cores are on the cusp of becoming state-of-the-art processor technology for the next decade. To guarantee efficient communication between multiple cores, a Network-on-a-Chip (NoC) is considered as an alternative to overcome the limitations of the ubiquitous bus technology. In this paper, we present an approach to further improve the routing in an NoC with a self-optimized routing strategy. We extended the routers of a network to measure their load and to send appropriate load information to their direct neighbors. The load information is used to decide in which direction a packet should be routed to avoid hot-spots. Evaluation results show a significant increase in the network throughput. With the self-optimized routing, the NoC is capable of routing up to two times more packets compared to the original routing algorithm proposed b

CiteSeerX
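The load-based direction choice described in the self-optimized routing abstract above can be sketched as follows. Several details are assumptions and are not stated in the abstract: a 2D mesh with (x, y) coordinates, direction names "E", "W", "N", "S", restriction to directions that move the packet closer to its destination, and the function name `next_hop`. The sketch also ignores deadlock avoidance, which a real router design would have to address.

```python
def next_hop(current, dest, neighbor_load):
    """Pick the least-loaded direction among those that approach the destination.

    current, dest: (x, y) router coordinates on an assumed 2D mesh.
    neighbor_load: dict mapping direction ("E", "W", "N", "S") to the load
                   last reported by that neighbor; missing entries count as 0.
    Returns a direction, or None when the packet has arrived.
    """
    cx, cy = current
    dx, dy = dest
    candidates = []
    if dx > cx:
        candidates.append("E")
    elif dx < cx:
        candidates.append("W")
    if dy > cy:
        candidates.append("N")
    elif dy < cy:
        candidates.append("S")
    if not candidates:
        return None  # already at the destination: deliver locally
    # Prefer the neighbor that reported the lowest load to steer around hot-spots.
    return min(candidates, key=lambda d: neighbor_load.get(d, 0.0))


# The eastern neighbor is congested, so the packet is sent north first.
print(next_hop((0, 0), (2, 2), {"E": 0.9, "N": 0.1}))
```

When only one direction approaches the destination, the load values have no effect, so this policy can only route around congestion while both dimensions still have distance left to cover.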